Abstract

An urban-oriented emergency assessment system for airborne Chemical, Biological, and Radiological (CBR) threats, called CT-Analyst and based on new principles, gives greater accuracy and much greater speed than is possible with current alternatives. The increased accuracy derives from detailed, three-dimensional (3D) computational fluid dynamics (CFD) computations including solar heating, buoyancy, complete building geometry specification, trees, and wind fluctuations. A limited number of such detailed high performance computing (HPC) computations for a given area can be extended to all wind directions and speeds, and to all likely sources and source locations, using a new data structure called Dispersion Nomografs™. By performing all the heavy computing ahead of time using the full power of HPC parallel platforms well suited to the application, the results of a number of complete, high-resolution 3D simulations can be recalled for operational use with no perceptible delay, faster than even simple models can be integrated. In this way, we have solved the usual dilemma of more computer time being required to obtain better answers. The best available answers can be presented instantly, with full urban geometry, in a readily comprehended format.

1. Introduction

The emergence of increasingly powerful computers stimulated the development of obstacle-resolving micro-scale flow and transport models based on CFD. In recent years, these types of models have come to play an important role in many applications. They serve as general tools in fluid engineering and wind engineering when complex flow systems have to be designed. Under the general category of Urban Aerodynamics[1], these models are now commonly applied to predict contaminant transport (CT) in complex, structured urban landscapes. Such models are used in the licensing of new industrial plants, in safety analysis studies for accidental releases of hazardous materials in the chemical industry, and in crisis management after terrorist attacks or accidents in urban environments.

Urban airflow accompanied by CT presents new, extremely challenging modeling requirements[2] that are best met using complex-geometry simulation tools developed by the aerospace industry. Configurations with complex geometries and unsteady buoyant flow physics are involved. The wide range of temporal and spatial scales quickly overwhelms current modeling capabilities. Crucial technical issues that need to be addressed include time-dependent turbulent fluid transport (aerodynamics), environmental boundary condition modeling (meteorology), and the practical post-processing of the simulation results for use by responders in actual emergencies. The advantages of the CFD approach and the large eddy simulation (LES) representation include the ability to quantify complex geometry effects, to predict dynamic nonlinear processes faithfully, and to treat turbulent problems reliably in regimes where experiments, and therefore model validations, are impossible or impractical.

CFD solutions to CT can be highly accurate, but they are too slow for emergency response purposes. One practical solution to this critical dilemma is to carry out the unsteady CFD simulations in advance and pre-compute compressed databases for specific urban areas, incorporating suitably parameterized weather for a full set of wind conditions and distributed test sources. The relevant information is summarized as Dispersion Nomograf™ datasets[3] so that it can be used in a portable system called Contaminant Transport Analyst (CT-Analyst®)[4], which reproduces CFD-quality results nearly instantaneously with little loss of fidelity.

This paper presents this new methodology, which brings the fidelity and accuracy of CFD to the first responder or warfighter at the speed necessary for emergency response, and gives an overview of the issues involved in meeting these seemingly contradictory requirements.

A. Standard CFD Simulations

Some “time-accurate” flow simulations that attempt to capture the urban geometry and fluid dynamic details are a direct application of standard (aerodynamic) CFD methodology to the urban-scale problem. An example is the finite-element CFD simulation of the dispersion of a contaminant in the Atlanta, Georgia metropolitan area[5]. The model includes topology and terrain data, and a typical mesh contains approximately 200 million nodes and 55 million tetrahedral elements. These are grand-challenge-size calculations; they are run on 1,024 processors of a Cray T3E and take up to a whole day. Similar approaches are being used by other research groups[6,7]. The chief difficulties with this approach for large urban regions are that the solutions are very computer intensive (days or weeks) and involve severe overhead associated with mesh generation.

B. The LES Approach for Contaminant Transport

Direct numerical simulation (DNS) is prohibitively expensive for most practical flows at moderate-to-high Reynolds number, and especially so for urban CT studies. At the other end of the CFD spectrum are the classic aerodynamic methods such as the Reynolds-Averaged Navier-Stokes (RANS) approach, which simulate the mean flow and approximately model the effects of fluctuating scales[8]. These approaches are typically unacceptable for urban CT modeling because they are unable to capture the inherently unsteady but coherent plume dynamics driven by the urban geometry. Large eddy simulation (LES) constitutes an effective intermediate approach between DNS and the RANS methods[9]. LES is capable of simulating flow features that cannot be handled with RANS, such as significant flow unsteadiness and localized vortex shedding, and it provides higher accuracy than the industrial methods at a lower cost than DNS. LES solutions converge to the solutions of the Navier-Stokes equations as resolution is increased, whereas RANS solutions generally do not. Because the larger-scale unsteady features of the flow govern the unsteady plume dynamics in urban geometries, the LES approximation can capture key features that the RANS methods and the various Gaussian plume methodologies cannot. Moreover, given its potential for higher computational efficiency, the Monotone Integrated LES (MILES) approach (see Reference 10 for a recent review) is well suited for CFD-based plume simulation in urban-scale scenarios, an application where classical LES methods are expensive.

A practical example of urban-scale MILES is depicted in Figure 1, which shows contaminant dispersion in Times Square, New York City. The figure demonstrates the typical complex, unsteady vertical mixing patterns caused by building vortices and recirculation, and the predicted endangered region associated with this particular release scenario. The large variability of concentration values from minute to minute is evident, underscoring the need for unsteady, time-dependent simulation models.

C. The FAST3D-CT Model

The FAST3D-CT 3D urban aerodynamics model[3,11,12] is based on the scalable, low-dissipation Flux-Corrected Transport (FCT) convection algorithm[13,14]. FCT is a high-order, monotone, positivity-preserving method for solving generalized continuity equations with source terms. The version of the convection algorithm implemented in FAST3D-CT is documented in References 15 and 16. Relevant physical processes simulated in FAST3D-CT include complex building vortex shedding, flows in recirculation zones, and approximations of dynamic subgrid-scale turbulence and stochastic backscatter. The model also incorporates a stratified urban boundary layer with realistic wind fluctuations, solar heating including shadows from buildings and trees, aerodynamic drag and heat losses due to the presence of trees, surface heat variations, and turbulent heat transport.
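To illustrate what “monotone” and “positivity-preserving” mean in practice, the following is a minimal one-dimensional sketch of flux-corrected transport in the generic Zalesak form: a donor-cell (upwind) low-order flux, a Lax-Wendroff high-order flux, and a limiter applied to the antidiffusive difference between them. It assumes a constant positive velocity and periodic boundaries; it is not LCPFCT or the FAST3D-CT implementation, and the function names are hypothetical.

```python
import numpy as np


def _neighborhood(a: np.ndarray, op) -> np.ndarray:
    """Apply `op` (np.maximum or np.minimum) over each cell and its two neighbors."""
    return op(op(np.roll(a, 1), a), np.roll(a, -1))


def fct_step(q: np.ndarray, c: float) -> np.ndarray:
    """Advance dq/dt + u dq/dx = 0 (u > 0, periodic) by one Zalesak-style
    flux-corrected transport step; c = u*dt/dx must lie in (0, 1]."""
    # Face fluxes times dt/dx; index i holds the face between cells i and i+1.
    f_low = c * q                                              # donor cell: monotone, diffusive
    f_high = c * (q + 0.5 * (1.0 - c) * (np.roll(q, -1) - q))  # Lax-Wendroff: 2nd order
    anti = f_high - f_low                                      # antidiffusive flux

    # Transported-and-diffused low-order solution.
    q_td = q - (f_low - np.roll(f_low, 1))

    # Bounds from the old and low-order solutions in each 3-cell neighborhood.
    q_max = np.maximum(_neighborhood(q, np.maximum), _neighborhood(q_td, np.maximum))
    q_min = np.minimum(_neighborhood(q, np.minimum), _neighborhood(q_td, np.minimum))

    # Total antidiffusive inflow (p_plus) and outflow (p_minus) for each cell.
    anti_left = np.roll(anti, 1)                               # flux through the left face
    p_plus = np.maximum(0.0, anti_left) - np.minimum(0.0, anti)
    p_minus = np.maximum(0.0, anti) - np.minimum(0.0, anti_left)
    with np.errstate(divide="ignore", invalid="ignore"):
        r_plus = np.where(p_plus > 0.0, np.minimum(1.0, (q_max - q_td) / p_plus), 0.0)
        r_minus = np.where(p_minus > 0.0, np.minimum(1.0, (q_td - q_min) / p_minus), 0.0)

    # A positive face flux removes mass from cell i and adds it to cell i+1.
    coef = np.where(anti >= 0.0,
                    np.minimum(np.roll(r_plus, -1), r_minus),
                    np.minimum(r_plus, np.roll(r_minus, -1)))
    limited = coef * anti
    return q_td - (limited - np.roll(limited, 1))
```

Advecting a square pulse with this step leaves it free of the overshoots and negative values that unlimited Lax-Wendroff would produce, while conserving the total amount of tracer exactly.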

Modeling a pollutant as globally well mixed is typically not appropriate in problems involving short time spans and large air volumes. It is important to capture the effects of unsteady, buoyant flow on the evolving pollutant concentration distributions. In typical urban scenarios, particulate and gaseous contaminants behave similarly insofar as transport and dispersion are concerned, so the contaminant spread can usually be simulated effectively using appropriate pollutant tracers with suitable sources and sinks. In other cases, the full details of multigroup particle distributions are required. Additional physics includes multigroup droplet and particle distributions with turbulent transport to surfaces, as well as gravitational settling, solar chemical degradation, evaporation of airborne droplets, relofting of particles on the ground, and ground evaporation of liquids. Details of the physical models in FAST3D-CT are given in Reference 16 and are omitted here for brevity. The primary difficulty is the effective calibration and validation of all these physical models, since much of the input needed from field measurements of these processes is typically insufficient or even nonexistent.
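Gravitational settling of each particle size group can be illustrated with the standard Stokes-law terminal velocity, which is valid for small particles at particle Reynolds numbers well below one. This is a sketch only: the text does not say which settling formulation FAST3D-CT uses, the slip correction for very fine particles is neglected, and the default densities and viscosity (unit-density particles in sea-level air) are assumptions.

```python
G = 9.81  # gravitational acceleration (m/s^2)


def stokes_settling_velocity(diameter_m: float,
                             particle_density: float = 1000.0,  # kg/m^3 (assumed)
                             air_density: float = 1.2,          # kg/m^3 (assumed)
                             air_viscosity: float = 1.8e-5) -> float:  # Pa s (assumed)
    """Terminal settling speed (m/s) of a small sphere in still air, Stokes regime."""
    return (particle_density - air_density) * G * diameter_m ** 2 / (18.0 * air_viscosity)


def particle_reynolds_number(diameter_m: float, speed: float,
                             air_density: float = 1.2,
                             air_viscosity: float = 1.8e-5) -> float:
    """Stokes' law is only trustworthy when this is much less than 1."""
    return air_density * speed * diameter_m / air_viscosity


# One settling speed per size group in a hypothetical multigroup distribution.
size_groups_m = [1e-6, 5e-6, 10e-6, 20e-6]
settling_speeds = [stokes_settling_velocity(d) for d in size_groups_m]
```

In a multigroup representation each size bin would carry its own settling speed; because the speed scales with the square of the diameter, a 20 μm particle settles four times faster than a 10 μm one (about 3 mm/s for the latter under these assumptions).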
2. Fast and Accurate CBR Defense

A typical run with the FAST3D-CT model for a complex urban area of 30 square km resolved with 6 m cells takes 24 hours on a 16-processor SGI computer system. This is significantly faster per square km than classical CFD models, owing to the savings achieved by MILES and other algorithmic improvements. The critical dilemma in the CT application is that unsteady urban-scenario flow simulations are currently feasible, but they are still expensive and require a degree of expertise to perform. Troops in the field, first responders, and emergency managers on site who must cope with contaminant release threats have perhaps a minute to make decisions and cannot afford to wait while actual simulations and data post-processing are carried out, either locally or remotely.
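The figures quoted for a typical run can be unpacked as follows. The number of vertical levels is not stated, so only the horizontal cell count follows directly from the text.

```latex
\[
N_{xy} = \frac{30\ \mathrm{km}^2}{(6\ \mathrm{m})^2}
       = \frac{3.0\times10^{7}\ \mathrm{m}^2}{36\ \mathrm{m}^2}
       \approx 8.3\times10^{5}\ \text{grid columns},
\qquad
16\ \text{processors}\times 24\ \mathrm{h} = 384\ \text{processor-hours per run}.
\]
```

Multiplying the column count by the (unstated) number of vertical levels would give the total cell count of such a run.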

An operational solution to this dilemma carries out unsteady CFD simulations in advance and pre-computes compressed databases for specific urban areas, incorporating relevant meteorology and a full set of wind conditions and distributed test sources. The relevant information is summarized as Dispersion Nomograf™ datasets[3] so that it can be applied directly on portable computers, with sensors and verbal reports providing current information about the local presence of contaminants, contaminant concentrations, and winds. Thus there is now a methodology that makes 3D CFD genuinely useful for crisis managers in real-time, operational situations. The accuracy of CFD simulations is recovered nearly instantly with little loss of fidelity. The current implementation of this new approach is called CT-Analyst®[4]. Near-instantaneous, high-fidelity CT assessment can reduce the number of people exposed in urban areas, even for large crowds out in the open, by up to a factor of six once a simple sensor or reporting network is in place. First responders and headquarters staff can use CT-Analyst displays for data fusion to give a minute-by-minute situation assessment.

A. Nomograf Description

Nomografs™ are new, compact, pre-computed data structures that capture the aerodynamic and turbulent effects of terrain, buildings, vegetation, and surface types on contaminant plume transport and dispersion. Using nomografs, improved accuracy and much greater speed are achieved for urban-oriented emergency assessment. By interpolating into these patented data structures, we can perform plume predictions in complex geometry, and related assessments, in milliseconds for wide areas with complex terrain such as cities, military bases, and important facilities.
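The layout of the patented nomograf structures is not described here, so the following is only a hypothetical illustration of why table lookup is so fast: once a scalar field (for example, a plume arrival time for one wind direction) has been pre-computed on a regular grid, any point query reduces to a constant-time bilinear interpolation with no time integration. The class and field names are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PrecomputedField:
    """Hypothetical stand-in for one pre-computed table: a scalar field
    sampled on a regular 2D grid (at least 2 x 2 points)."""
    x0: float                  # x coordinate of grid column 0 (m)
    y0: float                  # y coordinate of grid row 0 (m)
    spacing: float             # grid spacing (m)
    values: List[List[float]]  # values[row][col]; rows follow y, columns follow x

    def lookup(self, x: float, y: float) -> float:
        """Bilinear interpolation at (x, y): O(1) work, no time integration."""
        ny, nx = len(self.values), len(self.values[0])
        # Fractional grid coordinates, clamped so edge points remain valid.
        fx = min(max((x - self.x0) / self.spacing, 0.0), nx - 1.0)
        fy = min(max((y - self.y0) / self.spacing, 0.0), ny - 1.0)
        i = min(int(fx), nx - 2)
        j = min(int(fy), ny - 2)
        tx, ty = fx - i, fy - j
        v = self.values
        lower = (1.0 - tx) * v[j][i] + tx * v[j][i + 1]
        upper = (1.0 - tx) * v[j + 1][i] + tx * v[j + 1][i + 1]
        return (1.0 - ty) * lower + ty * upper
```

With `values=[[0, 10], [20, 30]]` and 6 m spacing, `lookup(3.0, 3.0)` returns the average of the four corner values, 15.0.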


The Naval Research Laboratory’s (NRL’s) FAST3D-CT CFD model, as described above, underpins our current implementation of dispersion nomografs. FAST3D-CT computes the multi-gigabyte 3D contaminant flow-path databases from which the high-resolution dispersion nomografs are extracted. Other detailed models that can provide the same database could also be the source of data for building nomografs. If enough data were taken in field trials or experiments, equivalent to three-dimensional fields of key variables over the region, nomografs could then be made from field data[3].

The four steps in generating and using dispersion nomografs are:

1. An accurate geometry database is compiled from light detection and ranging (LIDAR) data, stereo imagery, or shape files. The geometry database used by FAST3D-CT is a two-dimensional array (typically at one-meter resolution) that returns the heights of terrain, buildings, and trees, and the surface composition, in the computational domain.

2. Detailed 3D computational fluid dynamics calculations (FAST3D-CT) are repeated for 18 wind directions for the specified geometry, and the results are captured in an extensive database. These simulations include the appropriate urban boundary layer for the region, with realistic turbulent fluctuations imposed at the inflow boundaries. Multiple releases are tracked in each case, as described above.

3. The salient features of the CFD database are distilled into Dispersion Nomograf data structures for rapid interactive access. Time integration is thus replaced by interpolations that capture the aerodynamic effects of the full urban geometry through the Nomograf tables.

4. The Nomograf tables are encrypted and input to CT-Analyst, an easy-to-use graphical user interface for instantaneous situational analysis. Plume computation, for example, takes less than 50 milliseconds.
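Because step 2 simulates 18 wind directions, the stored directions are 360°/18 = 20° apart, and an arbitrary observed wind must be related to the two simulated directions that bracket it. The sketch below shows that bookkeeping, including the wrap-around at 360°. The linear blend of a per-direction quantity is an illustrative assumption; how CT-Analyst actually combines neighboring directions is not described in this paper.

```python
from typing import Sequence, Tuple

N_DIRECTIONS = 18                    # wind directions simulated (from the text)
STEP_DEG = 360.0 / N_DIRECTIONS      # 20 degrees between simulated directions


def bracket_wind_direction(theta_deg: float) -> Tuple[int, int, float]:
    """Return (lower_index, upper_index, weight_upper) for the two simulated
    directions bracketing theta_deg, wrapping around 360 degrees."""
    theta = theta_deg % 360.0
    lower = int(theta // STEP_DEG) % N_DIRECTIONS
    upper = (lower + 1) % N_DIRECTIONS
    weight_upper = (theta - lower * STEP_DEG) / STEP_DEG
    return lower, upper, weight_upper


def blend(per_direction_values: Sequence[float], theta_deg: float) -> float:
    """Linearly blend a hypothetical per-direction quantity between the two
    bracketing simulations (an illustrative choice, not CT-Analyst's method)."""
    lo, hi, w = bracket_wind_direction(theta_deg)
    return (1.0 - w) * per_direction_values[lo] + w * per_direction_values[hi]
```

For example, a 45° wind falls between the simulated 40° and 60° directions with a weight of 0.25 on the latter, and a 350° wind falls halfway between 340° and 0°.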

B. The CT-Analyst® Emergency Assessment Tool

To solve the critical dilemma and to meet real operational requirements, NRL developed an integrated CBR emergency assessment tool that is much faster than current “common use” models while being more accurate. The focus is on situation assessment through sensor fusion of qualitative and incomplete data. A terrorist probably will not tell us the amount and location of an agent source, or even what the agent is. Therefore we should not expect this information early enough for action in a crisis unless we can somehow generate what we need from the hints that will be available. The only existing software tool with these capabilities is called CT-Analyst; it is both zero-latency (meaning nearly zero computing delay) and high fidelity. CT-Analyst is entirely visual, i.e., “point-and-click,” in application. Beta-test versions, implemented for modest laptops and workstations and treating all of the buildings and structures in a multiple-square-mile area of downtown, have been delivered to the cities of Chicago, New York, Houston, and Washington DC, and to other officials in the Department of Defense. A corresponding capability has been delivered to civil emergency-management authorities in the District of Columbia. The Missile Defense Agency has incorporated CT-Analyst into its Post Engagement Ground Effects Model[18], and a commercial implementation for law enforcement is marketed by Defense Group Incorporated[19].

Each point in an urban area, if considered as a source location, has a downwind region, called the footprint, that can become contaminated by an airborne agent reaching that source point. Any selected location (considered as a site of interest) also has an upwind region (the danger zone) within which a contaminant would have to be released to reach and contaminate that site. These two classes of regions are completely complementary, being effectively each other’s inverse. All assessments in CT-Analyst are “computed” by manipulating these two distinct regions for sensor report locations, selected site locations, and source locations. The dispersion nomograf representation is designed to make these manipulations very fast while requiring only a minimal amount of tabulated data for each wind direction. The dispersion nomograf representation and processing algorithms also allow some new features. Multiple-sensor fusion for instantaneous situation assessment is an automatic consequence of the nomograf representation. The methodology can accept qualitative and anecdotal input and does not require knowledge of a source location, or even a source type or amount. A backtrack to unknown source locations is performed graphically, with zero delay, by overlap operations on the upwind danger zones of the “hot” and “cold” sensor reports.
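The footprint/danger-zone duality and the backtrack can be stated compactly with sets. In the minimal sketch below, a hypothetical predicate `reaches(s, t)` says whether a release at cell `s` can contaminate cell `t`; in CT-Analyst this information would come from the nomograf tables. The backtrack rule used here (inside every hot report’s danger zone, outside every cold report’s danger zone) is an assumed simplification that ignores plume arrival times.

```python
from typing import Callable, Iterable, Sequence, Set

Cell = int
Reaches = Callable[[Cell, Cell], bool]  # reaches(source, target)


def footprint(source: Cell, cells: Sequence[Cell], reaches: Reaches) -> Set[Cell]:
    """Downwind region that a release at `source` can contaminate."""
    return {t for t in cells if reaches(source, t)}


def danger_zone(site: Cell, cells: Sequence[Cell], reaches: Reaches) -> Set[Cell]:
    """Upwind region from which a release could reach `site`."""
    return {s for s in cells if reaches(s, site)}


def backtrack(hot: Iterable[Cell], cold: Iterable[Cell],
              cells: Sequence[Cell], reaches: Reaches) -> Set[Cell]:
    """Candidate source cells consistent with all sensor reports."""
    candidates = set(cells)
    for h in hot:    # the source must be able to reach every hot sensor
        candidates &= danger_zone(h, cells, reaches)
    for c in cold:   # and (assumed) must not be able to reach any cold sensor
        candidates -= danger_zone(c, cells, reaches)
    return candidates


def toy_reaches(s: Cell, t: Cell) -> bool:
    """Hypothetical 1D street, wind toward higher indices, plume reach 3 cells."""
    return s <= t <= s + 3


street = list(range(10))
sources = backtrack(hot=[5, 6], cold=[2, 8], cells=street, reaches=toy_reaches)
# sources == {3, 4}
```

Brute-force enumeration is used only for clarity; the point of the nomograf representation is to make these set operations nearly instantaneous.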

Figure 2 shows a typical CT-Analyst display for an urban area, in this case a section of downtown Houston. The backtrack to the source based solely on sensor reports is shown in dark blue. Star-shaped nodes are sources, triangular and circular nodes are sensor reports, and square nodes indicate specific sites. When a source node is active it is colored light blue, as in Figure 2. Footprints, plume envelopes, contaminant concentration plots, and escape routes can be displayed for sources by activating buttons on the lower portion of the CT-Analyst screen. Triangular sensor report nodes inside an active plume envelope are “hot” (red), while those still uncontaminated are “cold” (blue). Downwind consequence regions (for active “hot” reports) and upwind backtrack estimates (for all active “hot” and “cold” reports) can be displayed for the active sensor nodes, indicated by filled triangles.

Contamination zones from downwind leakage and upwind danger zones can be plotted for all square site nodes (bright green when they are active). The diagonal purple lines are the recommended evacuation routes.

3. HPC Implementation of Data Generation for Nomografs

The Nunn-Lugar-Domenici Domestic Preparedness Program[20] initially identified one hundred and twenty cities as likely terrorist targets. This number has since been increased to over 150. Military installations and other potential targets increase this number further. For each of these cities and installations, nomograf data has to be generated for each of eighteen wind angles and up to four environmental conditions. Each combination of wind angle and environmental condition requires a separate CFD run to obtain the data for the corresponding nomograf. Clearly, a large number of independent runs will be required. In addition, each city can be divided into a number of distinct tiles. When treated separately, these tiles lead to even more independent CFD runs that can be executed in parallel.
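The size of the production task follows directly from the numbers above. As a rough sketch, the number of independent runs is the product of cities, tiles per city, wind angles, and environmental conditions; the tile count is a hypothetical parameter, since the text does not say how many tiles a city would use.

```python
def independent_runs(cities: int, wind_angles: int = 18,
                     env_conditions: int = 4, tiles_per_city: int = 1) -> int:
    """Upper bound on independent CFD runs: one per (city tile, wind angle,
    environmental condition) combination. `tiles_per_city` is hypothetical."""
    return cities * tiles_per_city * wind_angles * env_conditions


print(independent_runs(150))                    # 10800
print(independent_runs(150, tiles_per_city=4))  # 43200
```

For 150 cities with 18 wind angles and four environmental conditions this gives 10,800 runs before any tiling; splitting each city into, say, four tiles raises it to 43,200.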

In performing production runs to develop the necessary Dispersion Nomograf datasets, several levels of parallelism can be exploited to optimize the processing and permit the datasets to be developed in a rational sequence, working from the center of the city outward. Because of these multiple levels of parallelism, the version of the model (and the parallel architecture) used for each case only needs to be moderately scalable. Measures and metrics such as gigaflops and parallel speed-up are largely academic. The key, and arguably the only, metric of importance is the overall time to solution. We must focus our efforts on reducing the time required to put in place a robust and accurate system for protecting our cities and military installations.
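One way to realize the center-outward sequence is to sort the tiles of a city by the distance of each tile from the city center and queue every wind angle for a tile before moving outward. The rectangular tile grid and the use of its geometric center as the “center of the city” are assumptions made for this sketch.

```python
import math
from typing import List, Tuple

Tile = Tuple[int, int]  # (column, row) of a tile in a hypothetical city grid


def center_outward_order(n_cols: int, n_rows: int) -> List[Tile]:
    """Tiles sorted by distance from the grid center (ties broken by row, then column)."""
    cx, cy = (n_cols - 1) / 2.0, (n_rows - 1) / 2.0
    tiles = [(c, r) for r in range(n_rows) for c in range(n_cols)]
    return sorted(tiles, key=lambda t: (math.hypot(t[0] - cx, t[1] - cy), t[1], t[0]))


def production_queue(n_cols: int, n_rows: int, wind_angles: int = 18) -> List[Tuple[Tile, int]]:
    """Independent (tile, wind-angle index) jobs, most central tiles first."""
    return [(tile, k) for tile in center_outward_order(n_cols, n_rows)
            for k in range(wind_angles)]
```

Each entry of the queue is an independent job, so any number of them can be dispatched at once while still finishing the central tiles first.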

Generating all the data to cover 150 cities will require many independent large, but not huge, shared-memory parallel jobs. In this application the scalability of individual jobs is not so critical, and an effective architecture would be a cluster of shared-memory nodes, each with 16-32 processors. This problem is currently limited by processor speed and memory bandwidth, especially between processors on each node. In this case, other than for file management and subsequent nomograf generation, communication within the cluster is negligible. Such computers are readily available today.
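Because the runs are independent, the time to solution on such a cluster is governed by how many batches of runs are needed. The estimate below assumes one run per shared-memory node, identical run times, and no queueing overhead; these are simplifying assumptions, and the node count in the example is hypothetical.

```python
import math


def time_to_solution_hours(n_runs: int, n_nodes: int, hours_per_run: float) -> float:
    """Wall-clock hours to finish `n_runs` independent, equal-length runs when
    each shared-memory node executes one run at a time."""
    batches = math.ceil(n_runs / n_nodes)
    return batches * hours_per_run


hours = time_to_solution_hours(n_runs=10_800, n_nodes=100, hours_per_run=24.0)
print(hours, hours / 24.0)  # 2592.0 108.0
```

With 10,800 runs, 100 nodes, and the 24-hour run time quoted in Section 2 for a 30 square km area, the estimate is 108 batches, or 2,592 hours (about 108 days).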

Special cases will arise that need to be considered for detailed analysis. These studies may involve higher resolution and include much more detailed and complex physics, especially if the agent is known and has complicated physics or chemistry. Agent fate and deposition may have to be considered. These cases will be far fewer in number but will be required due to special circumstances and will usually be time-critical. The individual runs for these cases must be completed as expeditiously as possible. Here, massive scalability becomes much more important. For the shared-memory-based FAST3D-CT code, the ideal computer would have flat memory access from all processors. In the long range, it may be necessary to re-develop an up-to-date distributed-memory version of FAST3D-CT.

A. The Current HPC Implementation of FAST3D-CT

Optimizing the sub-models in FAST3D-CT for distributed-memory parallel systems requires a number of different data structures. Detailed simulations of buoyant and neutral gas contaminants, multigroup droplet problems, and multigroup particle sources for biological and radiological (“dirty bomb”) scenarios, for example, cause severe load-balancing difficulties in realistic cases in which much of the grid may contain no contaminant at all. To provide a high degree of fidelity in solar deposition, a ray-trace algorithm was implemented so that buildings and trees could cast realistic shadows. To keep this cost to a few percent of the overall running time, this piece of complex-geometry physics was knowingly implemented in a manner conducive only to a shared-memory implementation.
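The shadowing idea can be sketched on the two-dimensional height array described in step 1 of the nomograf procedure in Section 2: from each surface cell, march a ray toward the sun and mark the cell as shaded if any cell along the way rises above the ray. This is a minimal, serial illustration with nearest-cell sampling and an assumed azimuth convention (measured counter-clockwise from the +x axis, pointing toward the sun); it is not the FAST3D-CT ray tracer.

```python
import math
from typing import List


def shadow_mask(heights: List[List[float]], spacing: float,
                sun_azimuth_deg: float, sun_elevation_deg: float) -> List[List[bool]]:
    """True where a cell is shaded by a taller cell between it and the sun.
    heights[row][col] in meters; spacing in meters; sun elevation must be > 0."""
    ny, nx = len(heights), len(heights[0])
    ux = math.cos(math.radians(sun_azimuth_deg))
    uy = math.sin(math.radians(sun_azimuth_deg))
    rise = math.tan(math.radians(sun_elevation_deg))  # ray height gained per meter
    shaded = [[False] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            z0 = heights[j][i]
            step = 1
            while True:
                # Nearest grid cell one more spacing along the ray toward the sun.
                ci = int(round(i + ux * step))
                cj = int(round(j + uy * step))
                if not (0 <= ci < nx and 0 <= cj < ny):
                    break                      # ray left the domain: cell is sunlit
                if heights[cj][ci] > z0 + rise * step * spacing:
                    shaded[j][i] = True        # an obstacle rises above the ray
                    break
                step += 1
    return shaded
```

The per-cell marches are independent of one another and each reads arbitrary parts of the height array, which is why this step maps naturally onto a shared-memory machine where every thread can see the whole array.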

The parallelization strategy adopted in FAST3D-CT is essentially loop-level parallelism controlled by OpenMP directives. However, these directives were placed such that the parallel regions were extended to encompass multiple loops. The outermost of the three-dimensional loops was parallelized whenever possible. The main computational kernel, LCPFCT[15], was placed in parallel loops so that the overhead of parallelization is minimal. The limiting factor to parallel speed-up in the current implementation of FAST3D-CT is non-local memory access. Although this can be, and is, minimized by the first-touch strategy that we have adopted, the directionally split scheme used for fluid and contaminant transport effectively requires a partial transpose of the data at each timestep. Though a transpose is not used explicitly, data must be accessed with non-unit strides prior to performing the y- and z-direction integrations. The long strides required in the z-direction integrations result in cache-line and translation lookaside buffer (TLB) misses, leading to poor performance. Performing a partial transpose explicitly for the z-direction integrations alleviates this. With this extra step, TLB misses are completely eliminated.
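The effect of the explicit partial transpose can be sketched with NumPy. Here a C-ordered array of shape `(nz, ny, nx)` stands in for FAST3D-CT’s arrays, so that x varies fastest in memory and consecutive z values are `ny*nx` elements apart. A simple one-dimensional diffusion step is a hypothetical stand-in for the LCPFCT kernel. NumPy will not reproduce the cache and TLB behavior, but the sketch shows the data movement and confirms that the transposed sweep gives the same answer as the direct one.

```python
import numpy as np


def diffuse_last_axis(a: np.ndarray, nu: float) -> np.ndarray:
    """Hypothetical 1D kernel: one explicit diffusion step along the last
    (fastest-varying) axis, with the two end values held fixed."""
    out = a.copy()
    out[..., 1:-1] += nu * (a[..., 2:] - 2.0 * a[..., 1:-1] + a[..., :-2])
    return out


def z_sweep_transposed(q: np.ndarray, nu: float) -> np.ndarray:
    """z-direction update on a (nz, ny, nx) array via an explicit partial
    transpose: copy so z is contiguous, run the 1D kernel, copy back."""
    zt = np.ascontiguousarray(np.moveaxis(q, 0, -1))     # (ny, nx, nz), z contiguous
    zt = diffuse_last_axis(zt, nu)                       # unit-stride access along z
    return np.ascontiguousarray(np.moveaxis(zt, -1, 0))  # back to (nz, ny, nx)


def z_sweep_direct(q: np.ndarray, nu: float) -> np.ndarray:
    """Reference version operating directly along axis 0, where successive z
    values are ny*nx elements apart (the long-stride access pattern)."""
    out = q.copy()
    out[1:-1] += nu * (q[2:] - 2.0 * q[1:-1] + q[:-2])
    return out


q = np.random.default_rng(0).random((6, 4, 5))  # (nz, ny, nx)
assert np.allclose(z_sweep_transposed(q, 0.2), z_sweep_direct(q, 0.2))
```

The two extra copies cost memory traffic, but in a compiled code they replace many scattered long-stride accesses inside the kernel with streaming, unit-stride ones.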


The use of OpenMP does limit the parallel speed-up possible in the FAST3D-CT code. OpenMP was selected because of the lower programming effort required due to the simpler program structure. This in turn eased debugging and allowed rapid insertion and testing of new physics models and algorithms. As explained above, there is also a hierarchy of levels of parallelism in this problem, suggesting that optimal performance will result from exploiting only a modest level of parallelism for each run while executing the many required runs in parallel.
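The argument for modest per-run parallelism can be made quantitative with Amdahl’s law. The parallel fraction of 0.95 used below is an illustrative assumption, not a measured property of FAST3D-CT, and the model ignores contention for memory bandwidth between runs sharing a node.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's-law speed-up of a single run on `processors` processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def aggregate_throughput(parallel_fraction: float, total_processors: int,
                         concurrent_runs: int) -> float:
    """Total work rate, in single-processor-run equivalents, when the machine
    is split evenly among independent runs executing at the same time."""
    per_run = total_processors // concurrent_runs
    return concurrent_runs * amdahl_speedup(parallel_fraction, per_run)


one_big = aggregate_throughput(0.95, 256, 1)      # about 18.6
many_small = aggregate_throughput(0.95, 256, 16)  # about 146.3
```

With 256 processors and a parallel fraction of 0.95, a single run reaches a speed-up of about 18.6, whereas sixteen concurrent 16-processor runs deliver an aggregate of about 146, nearly eight times the throughput.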

B. Future Direction

At the beginning of this section, we discussed the high performance computing requirements for developing nomograf representations for all the regions where CT-Analyst could be needed. This is the single largest cost and therefore the main deterrent to wider use of CT-Analyst. Reducing the computer requirements, and thus the time delay to implement CT-Analyst, is very important and therefore the subject of continuing research and development. To this end we have been planning a turnkey nomograf generation system that can be run by relatively untrained personnel (not PhDs) on smaller HPC systems already present at a number of sites. The four stages in computing nomografs described above become four coupled processes in a single computer system. The most difficult of these four stages to automate is the first: an automatic software system that can prepare the geometry database required for the detailed LES simulations needed to prepare nomografs. Graphical tools will help, but the system will still have to be keyed to one or two sources of geometry information, such as LIDAR, in standard formats.

Meanwhile, work continues on improving the speed of the FAST3D-CT computations themselves, and a factor of two seems possible with no loss of computational accuracy for the second stage of the process. The LES runs can certainly be packaged and controlled by a graphical user interface to generate the eighteen input data streams, manage the data collected from the runs, and report on progress. This same controller can also pipe the intermediate as well as final results into the nomograf generation software so that usable dispersion nomografs can be made accessible even before the longest 3D simulations are complete. Once a candidate nomograf dataset has been produced, the turnkey system would automatically overwrite earlier (lower-fidelity) approximations.
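The overwrite rule for the turnkey controller can be sketched as a small store that keeps, for each wind direction, the most refined nomograf data produced so far and rejects anything less refined. The scalar “fidelity” (for example, simulated minutes completed) and the class itself are hypothetical; the text does not say how intermediate datasets would be ranked.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class NomografStore:
    """Hypothetical store: most refined data seen so far for each wind direction."""
    best: Dict[int, Tuple[float, object]] = field(default_factory=dict)

    def offer(self, direction: int, fidelity: float, data: object) -> bool:
        """Install `data` only if it is more refined than what is stored."""
        current = self.best.get(direction)
        if current is None or fidelity > current[0]:
            self.best[direction] = (fidelity, data)
            return True
        return False

    def get(self, direction: int) -> Optional[object]:
        """Current best data for `direction`, or None if nothing is available yet."""
        entry = self.best.get(direction)
        return None if entry is None else entry[1]
```

Rejecting less refined offers protects against a slow intermediate result arriving after a more complete one has already been installed.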

4. Conclusion

Physically realistic urban aerodynamics simulations are now possible but still require some compromises due to time, computer, and manpower resource limitations. The necessary trade-offs result in sometimes using simpler models, algorithms, and geometry representations than we would wish. We have shown[16] that the building and large-scale fluid dynamics effects that can presently be captured govern the turbulent dispersion, and we therefore expect the computed predictions to improve over time, because the MILES methodology converges as computational resolution is increased. Inherent uncertainties in simulation inputs and model parameters beyond the environmental conditions also lead to errors that need to be further quantified by comparison with high-quality reference data.

Using this HPC-based LES model as a detailed scenario generator, we have invented a process to make 3D CFD genuinely useful in real time for crisis managers in operational situations. As a bottom line, the increased speed and accuracy of using dispersion nomografs in CT-Analyst can reduce the number of people exposed in urban CT scenarios, even for large crowds in the open, by 85 to 95% once an effective sensor or reporting network is in place. First responders and headquarters staff can use this tool for data fusion to give a minute-by-minute situation assessment. By performing all the major computing ahead of time on HPC computers well suited to the application, the results of a number of complete, high-resolution 3D simulations can be recalled for operational use with no perceptible delay, faster than even simple models can be integrated. In this way we have solved the usual dilemma of more computer time being required to obtain better answers.
Figure 1. Contaminant dispersion from an instantaneous release in Times Square, New York City, as predicted by FAST3D-CT. The frames show concentrations at 3, 5, 7, and 15 minutes after release.
Figure 2. CT-Analyst display for downtown Houston showing contaminant concentration contours (yellow, green, and blue), contamination footprint (grey), and evacuation routes (magenta/purple lines). The backtrack region is shown in dark blue.